[Feat(tests)] multi-worker freestyle mock #1430
Conversation
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (1)
📝 Walkthrough
Tests for failed-emails digest now poll the dry-run endpoint and filter mailbox-specific batches; the freestyle-mock Dockerfile adds a worker thread script and a queued WorkerPool with npm-install orchestration and safer HTTP request handling.

Changes: Test infrastructure and worker pool architecture
Sequence Diagram

```mermaid
sequenceDiagram
    participant HTTPServer
    participant WorkerPool
    participant Worker
    HTTPServer->>WorkerPool: run(jobConfig)
    WorkerPool->>WorkerPool: select idle slot or enqueue
    WorkerPool->>Worker: dispatch(job)
    Worker->>Worker: apply envVars, capture console, execute script
    Worker->>WorkerPool: postMessage {status, result or error, logs}
    WorkerPool->>HTTPServer: resolve or reject with {result|error, logs}
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Pull request overview
This PR aims to reduce e2e flakiness caused by timeouts during email polling by increasing the throughput of the freestyle-mock execution service, and by making the failed-emails digest test poll deterministically instead of using a fixed sleep.
Changes:
- Replace the inline single-threaded freestyle-mock execution model with a `worker_threads` pool to run multiple scripts in parallel.
- Add a polling helper to the `failed-emails-digest` e2e tests to wait until expected failures are recorded, removing a fixed sleep.
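The poll-until-recorded idea can be sketched in isolation. The helper name `pollUntil` and its options are assumptions for illustration; the actual test helper is `waitForFailedEmailsDigest`, and `getCount` stands in for "fetch the dry-run digest and count the current mailbox's failed emails":

```javascript
// Hedged sketch: poll an async counter until it reaches the expected value,
// instead of sleeping a fixed amount and hoping the pipeline has caught up.
async function pollUntil(getCount, expected, { timeoutMs = 30_000, intervalMs = 500 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const count = await getCount();
    if (count >= expected) return count;
    if (Date.now() >= deadline) {
      // A clear diagnostic beats a confusing snapshot mismatch later.
      throw new Error(`Timed out: expected ${expected} failed emails, last saw ${count}`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

On success the caller gets the final count; on timeout the error names both the expected and last-observed values, which is what makes the failure actionable compared to a fixed sleep.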
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| docker/dependencies/freestyle-mock/Dockerfile | Adds worker.mjs + a worker pool in server.mjs to parallelize script execution and recycle workers periodically. |
| apps/e2e/tests/backend/endpoints/api/v1/internal/failed-emails-digest.test.ts | Adds a digest polling helper and updates the dry-run test to wait for expected failures instead of sleeping. |
Comments suppressed due to low confidence (1)
apps/e2e/tests/backend/endpoints/api/v1/internal/failed-emails-digest.test.ts:167
- Indentation in this block is inconsistent (e.g., `expect(response.status)...` and the following `if`/`else` are indented deeper than the surrounding statements). Since apps/e2e runs ESLint with the `indent: ["error", 2, ...]` rule, this will likely fail `pnpm lint` for the e2e package; please reformat this section to match the established 2-space indentation levels.
```ts
const { response, batches: mockProjectFailedEmails } = await waitForFailedEmailsDigest(2);
expect(response.status).toBe(200);
if (process.env.STACK_TEST_SOURCE_OF_TRUTH === "true") {
  expect(mockProjectFailedEmails).toMatchInlineSnapshot(`[]`);
} else {
  expect(mockProjectFailedEmails).toMatchInlineSnapshot(`
```
Greptile Summary
This PR tackles test flakiness caused by the freestyle mock being unable to service concurrent email-rendering requests. The single-threaded server is replaced with a `worker_threads` pool.
Confidence Score: 5/5
Safe to merge: the worker pool replaces a well-understood single-threaded bottleneck, all error paths send an HTTP response, and previously flagged concerns are now addressed. The Dockerfile changes are a self-contained testing mock with no production data paths; the multi-worker logic is well-commented and correctly handles worker recycling, timeouts, and error/exit races. The test change removes a fragile fixed delay in favour of a polling helper that throws a clear diagnostic on timeout. No files require special attention beyond the minor polling-condition note in the test helper.

Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as HTTP Client
    participant Server as server.mjs
    participant Pool as WorkerPool
    participant Worker as worker.mjs (thread)
    participant FS as Filesystem
    Client->>Server: POST /execute/v2/script
    Server->>FS: mkdir(workDir), writeFile(script.mjs), writeFile(package.json)
    alt needs extra modules
        Server->>Server: runNpmInstall(workDir) [60s timeout]
    end
    Server->>Pool: pool.run({ scriptFile, envVars }) [30s timeout]
    alt idle slot available
        Pool->>Worker: postMessage({ scriptFile, envVars })
        Worker->>FS: import file://scriptFile
        Worker->>Worker: execute, capture logs
        Worker->>Pool: postMessage({ status, result/error, logs })
    else all slots busy
        Pool->>Pool: enqueue job, wait for slot
    end
    Pool->>Server: resolve/reject
    Server->>Client: 200/500 JSON response
    Server->>FS: rm(workDir) [finally]
    note over Pool,Worker: After MAX_JOBS_PER_WORKER=50 jobs, slot.draining=true, terminate, exit, spawnSlot()
```
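The slot-selection and FIFO-queue mechanics in the diagram can be sketched without `worker_threads`. In this hedged sketch each "slot" just runs an async function; the real pool posts the job to a worker thread and recycles the worker (via the `draining` flag) after `MAX_JOBS_PER_WORKER` jobs:

```javascript
// Minimal queued pool: pick an idle, non-draining slot or leave the job
// queued until one frees up. `runJob` is the caller-supplied dispatcher.
class QueuedPool {
  constructor(size, runJob) {
    this.runJob = runJob;
    this.slots = Array.from({ length: size }, () => ({ busy: false, draining: false }));
    this.queue = [];
  }

  run(job) {
    return new Promise((resolve, reject) => {
      this.queue.push({ job, resolve, reject });
      this.dispatchNext();
    });
  }

  dispatchNext() {
    if (this.queue.length === 0) return;
    const slot = this.slots.find((s) => !s.busy && !s.draining);
    if (!slot) return; // all slots busy or draining; job stays queued
    const { job, resolve, reject } = this.queue.shift();
    slot.busy = true;
    Promise.resolve(this.runJob(job))
      .then(resolve, reject)
      .finally(() => {
        slot.busy = false;
        this.dispatchNext(); // a slot just freed up; pull the next queued job
      });
  }
}
```

Setting `draining` before a worker is terminated (as the diagram's recycling note describes) is what keeps `dispatchNext` from handing a job to a slot that is about to exit.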
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
apps/e2e/tests/backend/endpoints/api/v1/internal/failed-emails-digest.test.ts:53
The polling exit condition only inspects `batches[0]?.emails.length`, but `selectBatchesForCurrentMailbox` can return multiple batches when the same owner mailbox has failed emails across more than one project (e.g. across the 10 repeats of the outer test). In that case the loop exits as soon as *any* first batch reaches the threshold — which may not be the current test's project — and the returned `batches` array then contains extra entries, causing the inline snapshot assertion to fail with a confusing multi-batch diff instead of a clear "wrong project" error. Scoping the check to the total across all current-mailbox batches would avoid this.
```suggestion
while (batches.reduce((sum, b) => sum + b.emails.length, 0) < expectedFailedEmailCount) {
```
### Issue 2 of 2
docker/dependencies/freestyle-mock/Dockerfile:57
**Stale console patch leaks between jobs on uncaught async rejection**
The `finally` block restores `console[method]` only when the `try` block resolves or throws synchronously. If user code spawns a detached `Promise` that rejects *after* `parentPort.postMessage(response)` fires (i.e. after the `finally` block has already run and the next job has started patching `console`), that rejection will execute inside the next job's patched console — so the next job's `logs` array picks up log lines that belong to the previous job. This is unlikely in practice for email-rendering scripts, but the issue exists structurally because the `parentPort.on("message", async …)` handler is one long async function and a stale floating promise can outlive the `finally`.
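One structural way around this (an alternative sketch, not what the PR does) is to route captured logs through `AsyncLocalStorage` instead of patch-and-restore: the console is patched once, and each job's logs array travels with the async context, so even a late floating promise writes to the array of the job that spawned it:

```javascript
import { AsyncLocalStorage } from "async_hooks";

const logStore = new AsyncLocalStorage();
const originalLog = console.log;
// Patch once at startup; per-job state lives in the async context, not in the patch.
console.log = (...args) => {
  const logs = logStore.getStore();
  if (logs) logs.push(args.join(" "));
  originalLog(...args);
};

// Each job gets its own logs array; awaits and detached promises started
// inside the job inherit the same store, so late logs stay attributed.
function runJob(fn) {
  const logs = [];
  return logStore.run(logs, async () => {
    await fn();
    return logs;
  });
}
```

The design trade-off: no restore step exists to race against, at the cost of a small per-call context lookup.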
Reviews (2): Last reviewed commit: "feat(tests): multi-worker freestyle mock"
Actionable comments posted: 3
🧹 Nitpick comments (2)
apps/e2e/tests/backend/endpoints/api/v1/internal/failed-emails-digest.test.ts (1)
38-61: 💤 Low value
Consider clarifying the single-batch assumption.

Line 53 checks only `batches[0]?.emails.length`, which assumes at most one batch per mailbox will be returned. While this appears correct given the test expectations (lines 167-188 show a single batch), the function comment could explicitly state this assumption for future maintainers.

📝 Optional comment enhancement

```diff
 // Polls the failed-emails-digest until the current mailbox's batch contains
 // at least `expectedFailedEmailCount` emails, or `timeoutMs` elapses. Replaces
 // a fixed sleep that flaked when the SMTP failure pipeline (DNS lookup +
 // connection attempt + outbox status update) hadn't recorded all failures yet.
+// Note: Checks only the first batch, assuming one batch per mailbox owner.
 async function waitForFailedEmailsDigest(
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/e2e/tests/backend/endpoints/api/v1/internal/failed-emails-digest.test.ts` around lines 38-61: Update the comment above waitForFailedEmailsDigest to explicitly state the single-batch assumption: note that the loop checks batches[0]?.emails.length and therefore expects at most one batch per mailbox (as used by selectBatchesForCurrentMailbox and the tests). Mention that if multiple batches could be returned in the future the logic should be revised (e.g., aggregate across batches or handle multiple indices) so maintainers know why batches[0] is used.

docker/dependencies/freestyle-mock/Dockerfile (1)
38-43: ⚡ Quick win
Log capture stringifies objects to `"[object Object]"`.

`args.map(String).join(" ")` collapses any non-primitive argument (objects, errors, arrays) to `[object Object]` / `Error: ...` with no fields, which makes the captured `logs` substantially less useful when tests assert on script output. Node's default console uses `util.format`, which preserves this detail and matches behavior of the un-patched console.

♻️ Proposed fix

```diff
 import { parentPort } from "worker_threads";
+import { format } from "util";
@@
 logMethods.forEach((method) => {
   console[method] = (...args) => {
-    logs.push({ message: args.map(String).join(" "), type: method });
+    logs.push({ message: format(...args), type: method });
     originalConsole[method](...args);
   };
 });
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docker/dependencies/freestyle-mock/Dockerfile` around lines 38-43: The captured logs currently stringify arguments with args.map(String).join(" "), losing object/Error detail; change the console patch inside the logMethods.forEach to use Node's util.format so captured messages match the unpatched console: require or import util.format (e.g. const { format } = require('util') or import { format } from 'util') and replace args.map(String).join(" ") with format(...args) when calling logs.push({ message: ..., type: method }), keeping the originalConsole[method](...args) call intact.
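The difference is easy to see in a standalone check (this is a demonstration, not the mock's code):

```javascript
import { format } from "util";

// String() collapses objects; util.format preserves their fields, matching
// what the un-patched console would print.
const arg = { code: "ETIMEDOUT", attempts: 3 };
const collapsed = [arg].map(String).join(" "); // "[object Object]"
const preserved = format(arg);                 // includes code and attempts
console.log(collapsed);
console.log(preserved);
```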
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docker/dependencies/freestyle-mock/Dockerfile`:
- Around line 136-181: The `failInFlight` path clears slot.busy but doesn't mark
the slot draining, so a concurrent dispatch can claim a worker that's about to
exit; update the code so that `slot.draining = true` is set when a worker fails
(either inside `failInFlight` or immediately in the `worker.on("error", ...)`
handler before calling `failInFlight`), ensuring `dispatchNext()` will skip the
slot until the `exit` handler replaces it; refer to the `failInFlight` function,
the `worker.on("error", (err) => { ... })` handler, and the `slot.draining` flag
when making the change.
- Around line 258-261: The call to pool.run({ scriptFile, envVars }) can hang
forever if the user script never posts a message, so wrap/race the pool.run
promise with a timeout and on timeout forcefully terminate the associated worker
so the worker 'exit' handler can respawn it and clear slot.busy; specifically,
implement a timedRace around pool.run that rejects after a configurable TIMEOUT,
and on that rejection call the worker termination method used by the pool (so
the exit/cleanup logic that checks MAX_JOBS_PER_WORKER and clears slot.busy
runs), ensuring pool.run's rejection path also marks the slot free and surfaces
a clear timeout error to the caller.
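The race described above might look like this sketch, where `onTimeout` stands in for terminating the slot's worker so its `exit` handler can respawn it (the helper name and callback are assumptions):

```javascript
// Race a job promise against a timer. On timeout, invoke the cleanup
// callback (e.g. worker.terminate() in the real pool) and reject with a
// clear error; on either outcome, clear the timer so it can't fire late.
function withTimeout(promise, ms, onTimeout) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => {
      onTimeout();
      reject(new Error(`Script timed out after ${ms}ms`));
    }, ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

The `finally(clearTimeout)` matters: without it, a fast job would leave a live timer that later fires `onTimeout` against a worker that is already running someone else's job.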
- Around line 244-256: The npm install spawn block can hang because it lacks an
'error' listener, doesn't consume stdio pipes, and has no timeout; update the
logic around spawn("npm", ["install"], { cwd: workDir, stdio: "pipe" }) (used
when needsInstall && Object.keys(requestedNodeModules).length) to add
installProcess.on("error", ...) to reject immediately on spawn failures, pipe or
forward installProcess.stdout and installProcess.stderr to a logger or to
process.stdout/stderr to avoid filling OS buffers, implement a configurable
timeout timer that kills installProcess and rejects if install takes too long,
and ensure you clean up listeners and timer in the installProcess.on("close",
...) handler so the promise always resolves or rejects and doesn't leak
resources.
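A hardened install step along the lines the review asks for could be sketched as follows ('error' listener, consumed stdio, kill-on-timeout); the function name matches the `runNpmInstall` mentioned in the walkthrough, but the exact shape here is an assumption:

```javascript
import { spawn } from "child_process";

// Run `npm install` in workDir with a hard timeout. Consuming stdout/stderr
// prevents the OS pipe buffers from filling and stalling npm; the 'error'
// listener catches spawn failures (e.g. missing binary or cwd).
function runNpmInstall(workDir, timeoutMs = 60_000) {
  return new Promise((resolve, reject) => {
    const child = spawn("npm", ["install"], { cwd: workDir, stdio: "pipe" });
    const chunks = [];
    child.stdout.on("data", (d) => chunks.push(d));
    child.stderr.on("data", (d) => chunks.push(d));
    const timer = setTimeout(() => {
      child.kill("SIGKILL");
      reject(new Error(`npm install timed out after ${timeoutMs}ms`));
    }, timeoutMs);
    child.on("error", (err) => {
      clearTimeout(timer);
      reject(err); // spawn itself failed; 'close' may still fire, extra settle is a no-op
    });
    child.on("close", (code) => {
      clearTimeout(timer);
      if (code === 0) resolve();
      else reject(new Error(`npm install exited ${code}: ${Buffer.concat(chunks)}`));
    });
  });
}
```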
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d6ad83eb-3f40-47f5-97dc-08abff6e76de
📒 Files selected for processing (2)
apps/e2e/tests/backend/endpoints/api/v1/internal/failed-emails-digest.test.ts
docker/dependencies/freestyle-mock/Dockerfile
Force-pushed from 0187a5d to 39db261
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docker/dependencies/freestyle-mock/Dockerfile (1)
45-66: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Restore the full worker environment after each job.

This only snapshots keys from `envVars`. If a script mutates or deletes any other `process.env` entry, that state leaks into later requests handled by the same recycled worker, which can reintroduce cross-test flakiness.

🛡️ Proposed fix

```diff
-  const previousEnv = new Map();
-  Object.entries(envVars || {}).forEach(([key, value]) => {
-    previousEnv.set(key, process.env[key]);
+  const previousEnv = { ...process.env };
+  Object.entries(envVars || {}).forEach(([key, value]) => {
     process.env[key] = value;
   });
@@
-  previousEnv.forEach((value, key) => {
-    if (value === undefined) {
-      delete process.env[key];
-    } else {
-      process.env[key] = value;
-    }
-  });
+  Object.keys(process.env).forEach((key) => {
+    if (!(key in previousEnv)) delete process.env[key];
+  });
+  Object.entries(previousEnv).forEach(([key, value]) => {
+    process.env[key] = value;
+  });
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docker/dependencies/freestyle-mock/Dockerfile` around lines 45 - 66, The current snapshot logic only saves keys listed in envVars (previousEnv) so any mutations/deletions to other process.env entries by the user code leak across recycled workers; instead, capture a full copy of process.env before importing/executing the user script (e.g., create a complete snapshot object of process.env before the userModule import/execute around scriptFile/userFunction) and in the finally block fully restore the environment by removing keys added during execution and restoring original values from that full snapshot (clear current process.env then reassign original keys), ensuring the worker's environment is identical after each job.
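The full-snapshot restore could be factored as a helper; this is a sketch of the idea (the actual worker inlines the logic rather than using a `withEnv` wrapper):

```javascript
// Snapshot the whole environment, apply the job's envVars, run the job,
// then delete keys the job added and reinstate every original value.
function withEnv(envVars, fn) {
  const snapshot = { ...process.env };
  Object.assign(process.env, envVars);
  try {
    return fn();
  } finally {
    for (const key of Object.keys(process.env)) {
      if (!(key in snapshot)) delete process.env[key];
    }
    Object.assign(process.env, snapshot);
  }
}
```

Unlike the per-key `Map` approach, this also undoes mutations and deletions the job made to variables it was never given, which is exactly the leak the review describes.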
🧹 Nitpick comments (1)
docker/dependencies/freestyle-mock/Dockerfile (1)
87-103: ⚡ Quick win
Use container-aware parallelism for `POOL_SIZE`.

`cpus().length` returns the host system's total logical cores regardless of cgroup limits, which can cause worker pool oversubscription in CI containers. Node.js 22 provides `availableParallelism()` for this; it respects Docker `--cpus`, Kubernetes cpu limits, and cgroup constraints (both v1 and v2).

♻️ Proposed fix

```diff
-import { cpus } from "os";
+import { availableParallelism } from "os";
@@
-const POOL_SIZE = Math.max(2, cpus().length);
+const POOL_SIZE = Math.max(2, availableParallelism());
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docker/dependencies/freestyle-mock/Dockerfile` around lines 87 - 103, The POOL_SIZE calculation uses cpus().length which ignores cgroup limits; change it to prefer os.availableParallelism() when present and fall back to cpus().length, then clamp with Math.max(2, ...). Update the code that defines POOL_SIZE (the POOL_SIZE constant) to call availableParallelism() safely (e.g., typeof availableParallelism === "function" ? availableParallelism() : cpus().length) and keep the existing Math.max(2, ...) clamp so the pool remains at least 2; reference the os import and the POOL_SIZE constant to locate where to modify.
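Standalone, the guarded version suggested in the prompt looks like:

```javascript
import os from "os";

// Prefer the cgroup-aware availableParallelism() when present, fall back to
// cpus().length on older Node, and clamp to at least 2 as the mock does.
const POOL_SIZE = Math.max(
  2,
  typeof os.availableParallelism === "function" ? os.availableParallelism() : os.cpus().length,
);
console.log(POOL_SIZE);
```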
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3d7b77fb-7493-469d-b369-d95928ec4329
📒 Files selected for processing (2)
apps/e2e/tests/backend/endpoints/api/v1/internal/failed-emails-digest.test.ts
docker/dependencies/freestyle-mock/Dockerfile
🚧 Files skipped from review as they are similar to previous changes (1)
- apps/e2e/tests/backend/endpoints/api/v1/internal/failed-emails-digest.test.ts
Lots of flakiness comes from email polling leading to timeouts. This usually happens when the freestyle mock cannot service requests in time. The old mock was single-threaded and so got clogged up by a lot of requests. A multi-worker system should be better.
Force-pushed from 39db261 to 30853e8
@greptileai could you please re-review this PR? Do an exhaustive review pass over the changes.
Signing up with an email is well tested elsewhere. Having the assertion here doesn't actually give us any extra coverage, but it does increase the load on the mock.
…1432)

## Summary
The multi-worker freestyle-mock rewrite (#1430) hardcoded `server.listen(8080)`, which collides with qstash inside the local-emulator container. Supervisord sets `PORT=8180` for freestyle-mock specifically to avoid this clash, but the new source ignores `process.env.PORT`. The local-emulator Dockerfile previously bridged this with a `server.replace('server.listen(8080)', ...)` string-patch on the embedded source. The new code is `server.listen(8080, () => { ... })`; the literal `'server.listen(8080)'` substring no longer matches, so the replace silently no-ops and freestyle-mock binds 8080. qstash then can't start (`address already in use: 127.0.0.1:8080` → FATAL), the backend (which depends on qstash) never comes up, and the emulator smoke test times out.

Observed in [this run](https://github.com/hexclave/stack-auth/actions/runs/25832479377):

```
smoke-test: FTL address already in use: 127.0.0.1:8080
smoke-test: WARN exited: qstash (exit status 1; not expected)
smoke-test: INFO gave up: qstash entered FATAL state, too many start retries too quickly
[603s] SMOKE TEST FAILED: backend /health?db=1 did not return 200 within 300s
```

## Changes
- `docker/dependencies/freestyle-mock/Dockerfile`: `server.listen(PORT)` where `PORT = process.env.PORT || 8080`, plus the startup log reflects the actual port.
- `docker/local-emulator/Dockerfile`: drop the now-redundant string-replace for the listen call. The two remaining replaces (`fs/promises` import + node_modules symlink) are unrelated and kept.

## Test plan
- [ ] QEMU emulator build workflow passes on this branch (smoke test reaches healthy backend).
- [ ] Verify locally that supervisord's `PORT=8180` is honored by freestyle-mock and qstash binds 8080 cleanly.

## Summary by CodeRabbit
* **Chores**
  * Server listening port is now configurable via PORT (default 8080).
  * Local emulator startup adjusted to better handle dependencies and create a node_modules symlink for smoother local runs.
  * Seed/process transaction timeout increased to 90s for reliability.
  * Local database statement timeout changed to 0 (no statement timeout).
* **CI**
  * Added step to enable and validate KVM access during emulator builds.
Context
Lots of flakiness comes from email polling leading to timeouts. This usually happens when the freestyle mock cannot service requests in time. The old mock was single-threaded and so got clogged up by a lot of requests.
Summary of Changes
A multi-worker system should be better.